25 research outputs found

    Improving time predictability of shared hardware resources in real-time multicore systems : emphasis on the space domain

    Critical Real-Time Embedded Systems (CRTES) follow a verification and validation process covering timing and functional correctness. This process includes timing analysis, which provides Worst-Case Execution Time (WCET) estimates as evidence that the execution time of the system, or parts of it, remains within its deadlines. A key design principle for CRTES is incremental qualification, whereby each software component can be verified and validated independently of any other component, with obvious benefits for cost. At the timing level, this requires time composability, such that the timing behavior of a function is not affected by other functions. CRTES are experiencing unprecedented growth, with rising performance demands that have motivated the use of multicore architectures. Multicores can provide the required performance and bring the potential of integrating several software functions onto the same hardware. However, contention in the access to shared hardware resources makes the execution time of a task depend on the other tasks running simultaneously. This dependence threatens time predictability and jeopardizes time composability. In this thesis we analyze and propose hardware solutions to be applied to current multicore designs for CRTES to improve time predictability and time composability, focusing on the on-chip bus and the memory controller. At the hardware level, we propose new bus and memory controller designs that control and mitigate contention between different cores and provide time composability by design, also in the context of mixed-criticality systems. At the analysis level, we propose contention prediction models that factor in the impact of contenders and do not need modifications to the hardware. We also propose a set of Performance Monitoring Counters (PMC) that provide evidence about the contention. We give special emphasis to the space domain, focusing on the Cobham Gaisler NGMP multicore processor, which is currently being assessed by the European Space Agency for its future missions.

    Improving early design stage timing modeling in multicore based real-time systems

    This paper presents a modelling approach for the timing behavior of real-time embedded systems (RTES) in early design phases. The model focuses on multicore processors, accepted as the next computing platform for RTES, and in particular it predicts the contention that tasks suffer in the access to multicore on-chip shared resources. The model has the key properties of not requiring the application's source code or binary and of having high accuracy and low overhead. The former is of paramount importance in those common scenarios in which several software suppliers work in parallel implementing different applications for a system integrator, subject to different intellectual property (IP) constraints. Our model helps reduce the risk of exceeding the budgets assigned to each application in late design stages, and the associated costs. This work has received funding from the European Space Agency under Project Reference AO=17722=13=NL=LvH, and has also been supported by the Spanish Ministry of Science and Innovation grant TIN2015-65316-P. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. Peer Reviewed. Postprint (author's final draft)

    Validating a timing simulator for the NGMP multicore processor

    Timing simulation is a key element in multicore systems design. It enables fast and cost-effective design space exploration, allowing new architectural improvements to be simulated without requiring RTL abstraction levels. Timing simulation also allows software developers to perform early testing of the timing behavior of their software without buying the actual physical board, which can be very expensive when the board uses non-COTS technology. In this paper we present the validation of a timing simulator for the NGMP multicore processor, a 4-core processor being developed to become the reference platform for future missions of the European Space Agency. The research leading to these results has received funding from the European Space Agency under contract NPI 4000102880 and the Ministry of Science and Technology of Spain under contract TIN-2015-65316-P. Jaume Abella has been partially supported by the Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. Peer Reviewed. Postprint (author's final draft)

    Computing Safe Contention Bounds for Multicore Resources with Round-Robin and FIFO Arbitration

    Numerous researchers have studied the contention that arises among tasks running in parallel on a multicore processor. Most of those studies seek to derive a tight and sound upper bound for the worst-case delay with which a processor resource may serve an incoming request, when its access is arbitrated using time-predictable policies such as round-robin or FIFO. We call this value the upper-bound delay (ubd). Deriving a trustworthy ubd statically is possible when sufficient public information exists on the timing latency incurred on access to the resource of interest. Unfortunately, that is rarely granted for commercial off-the-shelf (COTS) processors. Therefore, users resort to measurement observations on the target processor and thus compute a “measured” ubdm. However, using ubdm to compute worst-case execution time values for programs running on COTS multicore processors requires qualification of the soundness of the result. In this paper, we present a measurement-based methodology to derive a ubdm under round-robin (RoRo) and first-in-first-out (FIFO) arbitration, which accurately approximates ubd from above, without needing latency information from the hardware provider. Experimental results, obtained on multiple processor configurations, demonstrate the robustness of the proposed methodology. The research leading to this work has received funding from: the European Union’s Horizon 2020 research and innovation programme under grant agreement No 644080 (SAFURE); the European Space Agency under Contract 789.2013 and NPI Contract 40001102880; and COST Action IC1202, Timing Analysis On Code-Level (TACLe). This work has also been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. The authors would like to thank Paul Caheny for his help with the proofreading of this document. Peer Reviewed. Postprint (author's final draft)
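
To make the notion of an upper-bound delay concrete, the sketch below uses the textbook round-robin bound, in which a request waits for at most one access from each of the other requesters. This is not the measurement-based methodology of the paper. The function names, the requester count and the latency values are hypothetical, and the model ignores alignment effects on real hardware.

```python
def round_robin_ubd(num_requesters, max_latency):
    """Textbook bound: each other requester is served at most once before us."""
    return (num_requesters - 1) * max_latency


def simulate_wait(num_requesters, latencies, start_pointer):
    """Cycles requester 0 waits when the arbiter pointer is at start_pointer
    and every other requester has one pending request.

    latencies[i] is the (hypothetical) service time of requester i's access.
    """
    wait = 0
    pointer = start_pointer
    while pointer != 0:
        wait += latencies[pointer]
        pointer = (pointer + 1) % num_requesters
    return wait


if __name__ == "__main__":
    n, worst = 4, 9  # assumed: 4 requesters, 9-cycle worst-case access
    bound = round_robin_ubd(n, worst)
    observed = max(simulate_wait(n, [worst] * n, s) for s in range(n))
    print(f"analytical ubd = {bound} cycles, worst simulated wait = {observed} cycles")
```

The worst case occurs when the pointer has just passed requester 0, so all other requesters are served first. Shorter accesses or a more favourable pointer position only reduce the wait.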

    Contention-aware performance monitoring counter support for real-time MPSoCs

    Tasks running in MPSoCs experience contention delays when accessing the MPSoC’s shared resources, which complicates task timing analysis and the derivation of execution time bounds. Understanding the Actual Contention Delay (ACD) each task suffers due to other co-running tasks, and the particular shared hardware resources in which contention occurs, is of prominent importance to increase confidence in the derived execution time bounds of tasks. Moreover, whenever those bounds are violated, ACD provides information on the reasons for overruns. Unfortunately, existing MPSoC designs considered in real-time domains offer limited hardware support to measure tasks’ ACD, losing all these potential benefits. In this paper we propose the Contention Cycle Stack (CCS), a mechanism that extends performance monitoring counters to track specific events that allow estimating the ACD that each task suffers from every contending task on every shared hardware resource. We build the CCS using a set of specialized low-overhead Performance Monitoring Counters for the Cobham Gaisler GR740 (NGMP) MPSoC, used in the space domain, for which we show CCS’s benefits. The research leading to these results has received funding from the European Space Agency under contracts 4000109680, 4000110157 and NPI 4000102880, and the Ministry of Science and Technology of Spain under contract TIN-2015-65316-P. Jaume Abella has been partially supported by the Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. Peer Reviewed. Postprint (author's final draft)

    Ultra-wideband (UWB) channel estimation using Laguerre filters

    By channel estimation we mean approximating the discrete impulse response of an unknown channel under the minimum mean squared error (MSE) criterion. In the classical approach, this approximation is obtained by adaptively adjusting the coefficients of a transversal FIR filter, a process also known as 'training'. Channel estimation for ultra-wideband (UWB) channels has to cope with long impulse responses caused by the high sampling rate, and the energy is usually concentrated in a small fraction of the time slots. With a transversal FIR filter, a high order is required to estimate the long impulse response correctly, and most of the filter coefficients capture little or no energy because the energy is concentrated in small fractions of the time slots. The source of this problem is that an approximation consists of representing a function as a weighted sum over a complete orthonormal basis (whose elements are also called basis functions) and then truncating this sum to a fixed number of terms (equivalent to the filter order). In the classical transversal FIR approach, the basis functions correspond to the Kronecker delta, also known as the canonical basis. The problem with UWB channels arises from the extremely short time extent of the FIR basis functions. This study is a theoretical approach to characterizing the feasibility of Laguerre filters in a UWB communication system. The approximation is carried out using another orthonormal basis, the Laguerre sequences, which form a compromise between FIR and IIR systems and can also be regarded as a generalization of FIR filters. These sequences are used for channel estimation on several UWB test channels and on realizations of the stochastic UWB channel model provided by the IEEE 802.15.4a standard, showing that, as a generalization of FIR filters, they offer better performance in terms of mean squared error (MSE). However, the improvement is not very large when the channel exhibits abrupt changes, owing to the low-pass behavior of Laguerre filters. Adaptive methods such as the RLS algorithm are used to characterize these approximations in practice and to verify the theoretically obtained results.
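
The truncated-basis argument above can be illustrated with a toy sketch. It generates discrete Laguerre sequences with the standard all-pass recursion, projects an impulse response onto the first few basis functions, and compares the squared error with the FIR case, which is the same basis with pole 0. The smooth exponential test channel, the pole value 0.8 and the function names are assumptions chosen for illustration. They are not the study's UWB channels or its estimation procedure.

```python
import math


def laguerre_basis(pole, order, length):
    """First `order` discrete Laguerre sequences, each truncated to `length` samples.

    pole = 0 yields shifted Kronecker deltas, i.e. the transversal FIR basis.
    """
    gain = math.sqrt(1.0 - pole * pole)
    prev = [gain * pole ** n for n in range(length)]
    basis = [prev]
    for _ in range(1, order):
        cur = [0.0] * length
        for n in range(length):
            y_prev = cur[n - 1] if n > 0 else 0.0
            x_prev = prev[n - 1] if n > 0 else 0.0
            # All-pass section (z^-1 - pole) / (1 - pole * z^-1)
            cur[n] = pole * y_prev + x_prev - pole * prev[n]
        basis.append(cur)
        prev = cur
    return basis


def truncation_error(h, basis):
    """Squared error after projecting h onto the (orthonormal) basis."""
    approx = [0.0] * len(h)
    for b in basis:
        coeff = sum(hn * bn for hn, bn in zip(h, b))
        approx = [an + coeff * bn for an, bn in zip(approx, b)]
    return sum((hn - an) ** 2 for hn, an in zip(h, approx))


if __name__ == "__main__":
    h = [0.8 ** n for n in range(200)]  # assumed toy channel, not a UWB model
    err_fir = truncation_error(h, laguerre_basis(0.0, 4, len(h)))
    err_lag = truncation_error(h, laguerre_basis(0.8, 4, len(h)))
    print(f"4-term FIR error: {err_fir:.3f}, 4-term Laguerre error: {err_lag:.3e}")
```

The example favours the Laguerre basis on purpose, because the pole matches the channel's decay. For channels with abrupt changes the gain would be smaller, as the abstract notes.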

    A dual-criticality memory controller (DCmc): Proposal and evaluation of a space case study

    Multicore dual-criticality systems comprise two types of applications, each with a different criticality level. In the space domain these types are referred to as payload and control applications, which have high-performance and real-time requirements respectively. In order to control the interaction (contention) among payload and control applications in the access to main memory, reaching the goals of high bandwidth for the former and guaranteed timing bounds for the latter, we propose a Dual-Criticality memory controller (DCmc). DCmc virtually divides memory banks into real-time and high-performance banks, deploying a different request scheduling policy for each bank type, which facilitates achieving both goals. Our evaluation with a cycle-accurate multicore simulator and a real space case study shows that DCmc enables deriving tight WCET estimates regardless of the co-running payload applications, hence effectively isolating the effect of contention in the access to memory. DCmc also enables payload applications to exploit memory locality, which is needed for high performance.

    AHRB: A High-Performance Time-Composable AMBA AHB Bus

    Hard real-time systems are moving toward complex systems comprising chips with different IP components connected with standard buses. AMBA is one of the most used bus interfaces and has already been included in processors in the real-time domain. However, AMBA was not designed to provide time-composable Worst-Case Execution Time (WCET) estimates, which are desirable to reduce timing validation and verification costs. This paper analyzes and extends the AMBA Advanced High-performance Bus (AHB) specification to enable time-composable WCET estimates by design. Concretely, (1) we analyze in detail the AMBA AHB in the context of hard real-time systems, proving that it fails to provide time composability; (2) we define a restricted subset of AMBA AHB features, named restricted AHB (resAHB), that allows deriving time-composable, yet not tight, WCET estimates; and (3) we define an extension of resAHB, named Advanced High-performance Real-time Bus (AHRB), that includes the timing constraints in the specification. This allows deriving time-composable and tight WCET estimates. Our results show that AHRB can provide 3.5x tighter estimates than resAHB on average for EEMBC benchmarks.

    CleanET: enabling timing validation for complex automotive systems

    Timing validation for automotive systems occurs in late integration stages, when it is hard to control how the instances of software tasks overlap in time. To make things worse, in complex software systems, like those for autonomous driving, task scheduling has a strongly event-driven nature, which further complicates relating the task-overlapping scenarios (TOS) captured during software timing budgeting to those observed during validation phases. This paper proposes CleanET, an approach to derive the dilation factor r caused by the simultaneous execution of multiple tasks. To that end, CleanET builds on the TOS captured during testing and predicts how task execution times react under untested TOS (e.g. full overlap), hence acting as a means of robust testing. CleanET also provides additional evidence for certification about the derived timing budgets for every task. We apply CleanET to a commercial autonomous driving framework, Apollo, where task measurements can only be reasonably collected under 'arbitrary' TOS. Our results show that CleanET successfully derives the dilation factor and allows assessing whether execution times for the different tasks adhere to their respective deadlines for unobserved scenarios. This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P, the SuPerCom European Research Council (ERC) project under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 772773), and the HiPEAC Network of Excellence. MINECO partially supported Jaume Abella under Ramon y Cajal postdoctoral fellowship (RYC-2013-14717). Peer Reviewed. Postprint (author's final draft)

    Data Bus Slicing for Contention-Free Multicore Real-Time Memory Systems

    Memory access contention is one of the main contributors to tasks' execution time variability in real-time multicores. Existing techniques to control memory contention based on time-sharing memory access do not scale well with the increasing complexity of multicores, leading to a rapid increase of WCET estimates. This is due to the fact that requests from different tasks interleave in the access to memory, and for each of its requests a task has to make worst-case time allowances to account for the memory state left by the previous request, which may belong to a different task. In this paper, we propose a memory organization that controls contention by dividing the data bus into narrower independent data buses, thus removing conflicts among different tasks accessing memory. While narrower data buses require extra transfers, they allow exploiting memory locality, hence only slightly affecting average performance. Our evaluation on a solid space case study shows that the proposed memory organization provides contention-free memory access, facilitating timing analysis and tightening WCET estimates. The research leading to these results has received funding from the European Space Agency under contract NPI 4000102880 and the Ministry of Science and Technology of Spain under contract TIN-2015-65316-P. Jaume Abella has been partially supported by the Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. Peer Reviewed. Postprint (author's final draft)
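
The remark that narrower data buses require extra transfers comes down to simple arithmetic, sketched below. The 32-byte line, the 64-bit bus and the four slices are hypothetical values chosen for illustration, not parameters taken from the paper.

```python
def transfers_per_line(line_bytes, bus_bits):
    """Number of bus transfers (beats) needed to move one cache line."""
    line_bits = line_bytes * 8
    return -(-line_bits // bus_bits)  # ceiling division


def slice_width(bus_bits, num_slices):
    """Width of each independent slice when the data bus is divided evenly."""
    if bus_bits % num_slices != 0:
        raise ValueError("bus width must divide evenly into slices")
    return bus_bits // num_slices


if __name__ == "__main__":
    line_bytes, bus_bits, cores = 32, 64, 4  # assumed values
    shared = transfers_per_line(line_bytes, bus_bits)
    sliced = transfers_per_line(line_bytes, slice_width(bus_bits, cores))
    print(f"shared {bus_bits}-bit bus: {shared} transfers per line")
    print(f"{cores} slices of {slice_width(bus_bits, cores)} bits: {sliced} transfers per line")
```

Under these assumptions each core pays four times as many transfers per line, but no other core's requests can interleave with them on its slice.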